Efficient String Mining under Constraints Via the Deferred Frequency Index

Authors

  • David Weese
  • Marcel H. Schulz
Abstract

We propose a general approach to frequency-based string mining, which has many applications, e.g., in contrast data mining. Our contribution is a novel algorithm based on a deferred data structure. Despite its simplicity, our approach is up to 4 times faster and uses about half the memory of the best-known algorithm, by Fischer et al. Applications in various string domains, e.g., natural language, DNA, or protein sequences, demonstrate the improvement achieved by our algorithm.
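To make the problem setting concrete, the following is a minimal sketch of frequency-based contrast mining: it finds substrings that occur in at least `min_pos` strings of one database and in at most `max_neg` strings of another. It is a naive enumeration baseline and not the deferred frequency index described in the abstract. The function names, parameters, and the document-frequency notion of support are assumptions made for illustration only.

```python
from collections import Counter


def substring_support(strings):
    """Map each substring to the number of input strings that contain it."""
    support = Counter()
    for s in strings:
        # Collect distinct substrings so each one counts once per string.
        seen = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        support.update(seen)
    return support


def mine_contrast(positive, negative, min_pos, max_neg):
    """Hypothetical helper: substrings frequent in `positive`, rare in `negative`.

    A substring is reported if it occurs in at least `min_pos` positive
    strings and in at most `max_neg` negative strings.
    """
    pos = substring_support(positive)
    neg = substring_support(negative)
    # Counter returns 0 for substrings that never occur in `negative`.
    return sorted(p for p, c in pos.items() if c >= min_pos and neg[p] <= max_neg)


if __name__ == "__main__":
    print(mine_contrast(["abca", "bca", "cab"], ["aa", "b"], min_pos=2, max_neg=0))
```

This baseline enumerates a quadratic number of substrings per string, so it is only practical for small inputs; index-based methods such as the ones discussed here exist precisely to avoid that cost.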

Similar resources

Impact of Pollution Location on Time and Frequency Characteristics of Leakage Current of Porcelain Insulator String under Different Humidity and Contamination Severity

One of the important factors influencing the performance of outdoor insulators is pollution. Pollution, especially under humid conditions, reduces the surface resistance of the insulator and leads to a flow of leakage currents (LC) on the insulator surface, which may result in total flashover. The LC characteristics are affected by parameters such as the nature and severity of the pollution. Loca...

Efficient Optimum Design of Structures With Frequency Response Constraint Using High Quality Approximation

An efficient technique is presented for the optimum design of structures with both natural frequency and complex frequency response constraints. The main idea is to reduce the number of dynamic analyses by introducing high-quality approximations. Eigenvalues are approximated using the Rayleigh quotient. Eigenvectors are also approximated for the evaluation of eigenvalues and frequency responses. A tw...

Introducing Softness into Inductive Queries on String Databases

In many application domains (e.g., WWW mining, molecular biology), large string datasets are available and yet under-exploited. The inductive database framework assumes that both such datasets and the various patterns holding within them might be queryable. In this setting, queries which return patterns are called inductive queries and solving them is one of the core research topics for data mi...

Approximate String Similarity Join using Hashing Techniques under Edit Distance Constraints

The string similarity join, which is used to find similar string pairs in string sets, has received extensive attention in the database and information retrieval fields. For this problem, existing work usually adopts the filter-and-refine framework, and various filtering methods have been proposed. Recently, tree based index techniques with the edit distance co...

Optimal String Mining Under Frequency Constraints

We propose a new algorithmic framework that solves frequency-related data mining queries on databases of strings in optimal time, i.e., in time linear in the input and the output size. The additional space is linear in the input size. Our framework can be used to mine frequent strings, emerging strings and strings that pass other statistical tests, e.g., the χ²-test. In contrast to the presented...

Publication date: 2008